LNCC

OPTIMAL EXECUTION OF SCIENTIFIC WORKFLOWS IN IN-MEMORY DATAFLOW FRAMEWORKS

Event type:
Qualifying Examination

The volume of data produced by scientific simulations and experiments has been increasing at an astronomical rate. Scientific applications that consume large volumes of data are usually defined as workflows. There has been considerable progress in the parallel execution of scientific workflows on shared-nothing clusters. However, most current Scientific Workflow Management Systems do not handle memory and data locality appropriately. Apache Spark addresses these issues by chaining activities that should run locally, along with other optimizations such as in-memory storage of intermediate data and caching of pre-computed values. However, Spark requires existing workflows to be described using its own API, which forces the activities to be implemented in Python, Java, Scala or R in order to take advantage of the RDD (Resilient Distributed Dataset), a memory-based storage abstraction that allows Spark to execute a chain of activities efficiently.

In this qualification proposal, we describe a project to develop a Scientific Workflow Management System called TARDIS, whose objective is to run existing workflows (e.g., those designed for Pegasus or Chiron) inside a Spark cluster, using RDDs and smart caching, in a way that is completely transparent to the user.
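
The chaining and caching ideas mentioned in the abstract can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: it is neither Spark's RDD API nor the TARDIS design, and the names `LazyDataset`, `calibrate`, and `calls` are hypothetical, introduced purely for illustration. Activities are recorded lazily and run record by record when a result is requested, and a dataset marked with `cache()` keeps its computed records in memory so that later branches of the workflow reuse them.

```python
class LazyDataset:
    """Toy in-memory dataset with lazily chained activities (hypothetical,
    not Spark's RDD API and not the TARDIS implementation)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument callable returning an iterator
        self._keep = False        # True once cache() has been requested
        self._cached = None       # materialized records, if cached

    @classmethod
    def from_records(cls, records):
        data = list(records)
        return cls(lambda: iter(data))

    def _iterate(self):
        # Reuse cached records when available; otherwise compute on demand.
        if self._cached is not None:
            return iter(self._cached)
        if self._keep:
            self._cached = list(self._compute())
            return iter(self._cached)
        return self._compute()

    def map(self, activity):
        # Chain lazily: nothing runs until collect() pulls the records.
        return LazyDataset(lambda: (activity(r) for r in self._iterate()))

    def filter(self, predicate):
        return LazyDataset(lambda: (r for r in self._iterate() if predicate(r)))

    def cache(self):
        # Mark this dataset so its records are kept in memory after first use.
        self._keep = True
        return self

    def collect(self):
        return list(self._iterate())


# Count how often the (hypothetical) expensive activity actually runs.
calls = {"calibrate": 0}


def calibrate(x):
    calls["calibrate"] += 1
    return x * 2


raw = LazyDataset.from_records(range(5))
calibrated = raw.map(calibrate).cache()

# Two downstream branches share the cached intermediate result.
large = calibrated.filter(lambda v: v >= 4).collect()
total = sum(calibrated.map(lambda v: v + 1).collect())
```

In this sketch, `calibrate` runs five times in total even though two branches consume its output, because the second branch reads the cached records instead of recomputing them.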

Start date: 22/11/2016
Time: 13:30
End date: 22/11/2016
Time: 15:30

Venue: LNCC - Laboratório Nacional de Computação Científica - Auditório B

Student:
Daniel Gaspar Gonçalves de Souza - Universidade Católica de Petrópolis - UCP

Advisor:
Fabio André Machado Porto - Laboratório Nacional de Computação Científica - LNCC

Examination committee members:
Antônio Tadeu Azevedo Gomes - Laboratório Nacional de Computação Científica - LNCC
Artur Ziviani - Laboratório Nacional de Computação Científica - LNCC
Daniel de Oliveira
